TRANSMUT‐Spark: Transformation mutation for Apache Spark
Authors
Abstract
This paper proposes TRANSMUT‐Spark, a tool for automating mutation testing of big data processing code within Spark programs. Apache Spark is an engine for big data analytics/processing that hides the inherent complexity of parallel big data programming. Nonetheless, programmers must cleverly combine Spark built-in functions within programs and guide the engine to use the right data management strategies to exploit the computational resources required by big data processing and avoid substantial production losses. Many programming details in Spark data processing code are prone to false statements that need to be correctly and automatically tested. This paper explores the application of mutation testing to Spark programs, a fault-based testing technique that relies on fault simulation to evaluate and design test sets. The paper introduces TRANSMUT‐Spark for testing Spark programs by automating the most laborious steps of the process and fully executing the mutation testing process. The paper describes how TRANSMUT‐Spark automates the mutant generation, test execution, and adequacy analysis phases of mutation testing. It also discusses the results of experiments to validate the tool and argues its scope and limitations.
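The three phases named in the abstract (mutant generation, test execution, adequacy analysis) can be illustrated with a small sketch. This is not TRANSMUT‐Spark's API and does not use Spark: plain Python lists stand in for a distributed dataset, and the names `make_program`, `mutants`, and `run_mutation_analysis`, as well as the three mutations chosen, are hypothetical and only meant to show the general idea of mutating a filter-then-aggregate pipeline.

```python
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, int]
Program = Callable[[List[Record]], List[Record]]


def make_program(keep: Callable[[int], bool],
                 combine: Callable[[int, int], int]) -> Program:
    """Build a tiny 'filter then reduce-by-key' pipeline (stand-in for Spark code)."""
    def program(records: List[Record]) -> List[Record]:
        totals: Dict[str, int] = {}
        for key, value in records:
            if keep(value):
                # First value for a key is stored; later ones are combined.
                totals[key] = combine(totals[key], value) if key in totals else value
        return sorted(totals.items())
    return program


# Program under test: keep positive values, then sum them per key.
original = make_program(lambda v: v > 0, lambda a, b: a + b)

# Mutant generation (hand-written here; illustrative mutations only).
mutants: Dict[str, Program] = {
    "negate_filter": make_program(lambda v: not (v > 0), lambda a, b: a + b),
    "delete_filter": make_program(lambda v: True, lambda a, b: a + b),
    "keep_first_in_reduce": make_program(lambda v: v > 0, lambda a, b: a),
}


def run_mutation_analysis(original: Program,
                          mutants: Dict[str, Program],
                          test_inputs: List[List[Record]]):
    """Execute every mutant on every test input and compute the mutation score."""
    expected = [original(inp) for inp in test_inputs]
    # A mutant is killed if at least one test input exposes a different output.
    killed = sorted(
        name for name, mutant in mutants.items()
        if any(mutant(inp) != exp for inp, exp in zip(test_inputs, expected))
    )
    score = len(killed) / len(mutants)
    return killed, score


weak_tests = [[("a", 1), ("b", 2)]]
strong_tests = weak_tests + [[("a", 1), ("a", 2), ("b", -5)]]

if __name__ == "__main__":
    print(run_mutation_analysis(original, mutants, weak_tests))
    print(run_mutation_analysis(original, mutants, strong_tests))
```

With the single-input test set only the negated filter is killed, because no input contains a non-positive value or a repeated key. Adding the second input kills all three mutants, which is the kind of signal adequacy analysis uses to guide test design.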
Similar resources
Approximate Stream Analytics in Apache Flink and Apache Spark Streaming
Approximate computing aims for efficient execution of workflows where an approximate output is sufficient instead of the exact output. The idea behind approximate computing is to compute over a representative sample instead of the entire input dataset. Thus, approximate computing — based on the chosen sample size — can make a systematic trade-off between the output accuracy and computation effi...
MLlib: Machine Learning in Apache Spark
Apache Spark is a popular open-source platform for large-scale data processing that is well-suited for iterative machine learning tasks. In this paper we present MLlib, Spark’s open-source distributed machine learning library. MLlib provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives. Shi...
Modeling and Simulating Apache Spark Streaming Applications
Stream processing systems are used to analyze big data streams with low latency. The performance in terms of response time and throughput is crucial to ensure all arriving data are processed in time. This depends on various factors such as the complexity of used algorithms and configurations of such distributed systems and applications. To ensure a desired system behavior, performance evaluatio...
Balanced Graph Partitioning with Apache Spark
A significant part of the data produced every day by online services is structured as a graph. Therefore, there is the need for efficient processing and analysis solutions for large scale graphs. Among the others, the balanced graph partitioning is a well known NP-complete problem with a wide range of applications. Several solutions have been proposed so far, however most of the existing state-...
Journal
Journal title: Software Testing, Verification & Reliability
Year: 2022
ISSN: 1099-1689, 0960-0833
DOI: https://doi.org/10.1002/stvr.1809